Risk Prediction for the Development of Hyperuricemia: Model Development Using an Occupational Health Examination Dataset

OBJECTIVE: Hyperuricemia (HUA) has become the second most common metabolic disease in China after diabetes, and its disease burden is considerable. METHODS: We conducted a retrospective cohort study, with a baseline survey completed from January to September 2017 and a follow-up survey completed from March to September 2019. The study population comprised 2992 steelworkers. Three models (Logistic regression, CNN, and XG Boost) were established to predict HUA incidence in steelworkers. The predictive performance of the three models was evaluated in terms of discrimination, calibration, and clinical applicability. RESULTS: In the training set, the accuracy of the Logistic regression, CNN, and XG Boost models was 84.4%, 86.8%, and 86.6%, sensitivity was 68.4%, 72.3%, and 81.5%, specificity was 82.0%, 85.7%, and 86.8%, the area under the ROC curve was 0.734, 0.724, and 0.806, and the Brier score was 0.121, 0.194, and 0.095, respectively. Overall, the evaluation indices of the XG Boost model were better than those of the other two models, and similar results were obtained in the validation set. The XG Boost model also had higher clinical applicability than the Logistic regression and CNN models. CONCLUSION: The XG Boost model predicted better than the CNN and Logistic regression models and was suitable for predicting HUA onset risk in steelworkers.


Introduction
Hyperuricemia (HUA) is a metabolic disorder disease that develops due to abnormal purine metabolism, resulting in elevated serum uric acid (SUA) concentrations [1]. A 2014 meta-analysis covering 16 provinces, municipalities, and autonomous regions in China showed that the prevalence of HUA in China was 13.3% (19.4% for men and 7.9% for women) [2]. Another meta-analysis in 2021, which included 2,277,712 subjects, showed that the prevalence of HUA had increased to 16.4% (20.4% for men and 9.8% for women) [3]. Previous studies have shown that the prevalence of HUA in China has doubled in the last 20 years and has become another public health problem of concern after diabetes [4]. Worldwide, the burden of gout has increased in 195 countries and regions, especially in developed countries and regions [5]. HUA is not only an early stage of gout but also an independent risk factor for coronary heart disease, hypertension, diabetes, and chronic kidney disease [6], which seriously endangers human health.
The steel industry is a pillar industry of the Chinese economy and directly employs as many as two million people. The health status of the workers is directly related to the development of the Chinese steel industry. Steelworkers are exposed for long periods to occupational hazards such as shift work, high temperature, and noise, and many also have unhealthy habits such as smoking, alcohol consumption, and a high-salt diet, so the factors that cause or influence HUA in this group differ from those in the general population [7]. Therefore, there is an urgent need to develop new HUA risk prediction models for steelworkers, which can be used to improve their quality of life and health status.
Logistic regression is a traditional prediction model commonly used in the medical field and is widely used for a variety of disease predictions because of its clear parameter meaning and easy-to-understand outcome metrics. The convolutional neural network (CNN) is a feedforward neural network with a deep structure that is good at mining local features of data, combining them into global features, and performing classification, and it has some advantages that traditional techniques do not have [8]. XG Boost (eXtreme Gradient Boosting) achieves classification by iteratively adding classifiers; its regularization term helps keep the model robust, and its handling of missing data reduces the time spent processing features [9]. We established these three HUA risk prediction models based on the medical examination data of more than two thousand steelworkers and compared their predictive performance, aiming to select the optimal model and provide a theoretical basis for the health management of this special occupational group.
At present, public awareness of HUA in China is still insufficient, the prevention and treatment situation is not optimistic, and the awareness rate and cure rate of HUA among patients are low [10,11]. Therefore, screening the risk factors for HUA to establish prediction models, and thereby enabling early identification, detection, and intervention, has great social value for preventing and controlling the development of HUA and reducing the burden of the disease.

Study Design and Participants
The present study was a retrospective cohort study based on the Chinese National Key Research and Development Program project "Beijing-Tianjin-Hebei Regional Occupational Population Health Effects Cohort Study", which completed the baseline survey from January to September 2017 and the follow-up survey from March to September 2019. A total of 2992 steelworkers were included in the study. The inclusion criteria were: formal employee of the unit; more than 1 year of service; free of HUA at the time of the baseline survey; and voluntary signing of the informed consent form. The exclusion criteria were age > 60 years and incomplete information. The study was reviewed and approved by the Ethics Committee of North China University of Technology (approval number: 16004).

Data Collection and Preprocessing
The subjects of this study were workers in the production department of Tangshan Iron and Steel Group who participated in the health examination, and all information was obtained from the baseline and follow-up surveys of the Beijing-Tianjin-Hebei cohort, including questionnaires, physical examinations, and laboratory examinations. The final data set was randomly divided into the training set (70% of observations) and the validation set (30%).
The questionnaire for the survey was developed by our team, and one-on-one interviews with the workers of the steel enterprise were conducted by professionally trained PhD and MSc students from the School of Public Health of North China University of Technology.
Physical examinations were conducted by trained professional medical examiners according to standard testing methods for height, weight, blood pressure, and other indicators for workers in this enterprise.
For laboratory testing, fasting blood and morning urine were collected by the medical examination hospital before 9:00 a.m. daily and sent to the laboratory department of the medical examination hospital for uniform blood biochemical testing using a Myriad automatic biochemical analyzer (BS-800). The test indexes included fasting plasma glucose (FPG), uric acid (UA), total cholesterol (TC), triglyceride (TG), high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), creatinine (Cr), urea nitrogen (BUN), etc.

Definition of HUA
According to the Practice Guidelines for the Diagnosis and Management of Hyperuricemia in Renal Diseases in China (2017 edition) [12], developed by the Nephrologist Branch of the Chinese Physicians Association, blood uric acid ≥ 420 µmol/L in men or ≥ 360 µmol/L in women at the follow-up survey was defined as HUA.
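The outcome definition above reduces to a sex-specific threshold check. The following is a minimal sketch of that rule, not the authors' code; the function and constant names are hypothetical, and the only values taken from the text are the two cut-offs.

```python
# Sex-specific serum uric acid cut-offs in µmol/L, as quoted in the text.
HUA_CUTOFF_UMOL_L = {"male": 420.0, "female": 360.0}


def is_hua(sex: str, uric_acid_umol_l: float) -> bool:
    """Return True when blood uric acid meets the sex-specific HUA cut-off.

    `sex` must be "male" or "female" (hypothetical coding for this sketch).
    """
    return uric_acid_umol_l >= HUA_CUTOFF_UMOL_L[sex]


print(is_hua("male", 431.0), is_hua("female", 355.0))
```

Because the comparison is "greater than or equal to", a value exactly on the cut-off counts as HUA.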

1. Hypertension: According to the classification criteria of the Chinese Guidelines for the Prevention and Treatment of Hypertension (2018 revision) [13], systolic blood pressure ≥ 140 mmHg and/or diastolic blood pressure ≥ 90 mmHg, or a previous history of hypertension with current use of antihypertensive drugs, was defined as hypertension.
2. Diabetes: According to the classification criteria for glucose metabolic status in the Chinese guidelines for the prevention and treatment of type 2 diabetes mellitus (2020 edition) [14], fasting blood glucose ≥ 7.0 mmol/L, or a previous history of diabetes with current diabetes treatment, was defined as diabetes mellitus.
3. Smoking index: SI = number of cigarettes smoked per day × number of years of smoking [15]. The current study divided the smoking index into 3 groups according to the median: group 0 (0), group 1 (1 to <300), and group 3 (300 and above).
4. Cumulative occupational exposures: The definitions of cumulative noise exposure, cumulative dust exposure, cumulative heat exposure, and cumulative days of night shift are detailed in a published article by our research group [16].
6. Physical exercise: more than three times a week, no less than 30 min each time.
7. Dyslipidemia: According to the Chinese guidelines for the prevention and treatment of dyslipidemia in adults (2016 revision) [17], serum total cholesterol ≥ 6.2 mmol/L, and/or triglycerides ≥ 2.3 mmol/L, and/or LDL cholesterol ≥ 4.1 mmol/L, and/or HDL cholesterol < 1.0 mmol/L, or a previous history of hyperlipidemia, or the current use of lipid-lowering drugs, was defined as dyslipidemia.
9. Physical activity: The physical activity of workers was investigated using the International Physical Activity Questionnaire (IPAQ) (long form) [18]. An overall weekly physical activity level < 600 MET-min/w was defined as low intensity, ≥ 600 MET-min/w as medium intensity, and ≥ 3000 MET-min/w as high intensity.
10. Occupational stress: The revised Chinese version of the Job Content Questionnaire (JCQ) [19] was used to assess occupational stress, using the ratio of job demands to job control (D/C ratio), with a D/C ratio > 1 indicating occupational stress and a D/C ratio ≤ 1 indicating no occupational stress.
11. Sleep quality: The sleep quality of steelworkers was assessed with the internationally accepted Athens Insomnia Scale (AIS) [20] and classified by total score as no sleep disorder (total score ≤ 6) or insomnia (total score > 6).
12. DASH score: The DASH (Dietary Approaches to Stop Hypertension) dietary model encourages the intake of five major food groups (fruits, vegetables, nuts and legumes, low-fat milk, and whole grains), which were positively scored: the higher the frequency of intake, the higher the score. The three major food groups restricted by the DASH model (sodium-containing foods, red and processed meats, and sweetened beverages) were negatively scored: the more frequently they were consumed, the lower the score.
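Several of the definitions above are simple threshold rules, and it can help to see them written out. The sketch below is illustrative only: all function names and return labels are hypothetical, the smoking index band "1 to <300" is our reading of the middle group, and the medium IPAQ band is assumed to run from 600 up to (but not including) 3000 MET-min/w.

```python
def smoking_index(cigarettes_per_day: float, years_smoked: float) -> float:
    """SI = cigarettes smoked per day x years of smoking."""
    return cigarettes_per_day * years_smoked


def smoking_index_band(si: float) -> str:
    """Band the smoking index at 0 and at 300 (labels are illustrative)."""
    if si == 0:
        return "0"
    return "1 to <300" if si < 300 else "300 and above"


def is_hypertensive(sbp: float, dbp: float, treated_history: bool = False) -> bool:
    """SBP >= 140 and/or DBP >= 90 mmHg, or a history with current drug treatment."""
    return sbp >= 140 or dbp >= 90 or treated_history


def ipaq_level(met_min_per_week: float) -> str:
    """Classify overall weekly physical activity from IPAQ MET-min/w."""
    if met_min_per_week >= 3000:
        return "high"
    return "medium" if met_min_per_week >= 600 else "low"


def has_occupational_stress(demand_score: float, control_score: float) -> bool:
    """Demand/control (D/C) ratio above 1 indicates occupational stress."""
    return demand_score / control_score > 1


def has_insomnia(ais_total: int) -> bool:
    """Athens Insomnia Scale total score above 6 indicates insomnia."""
    return ais_total > 6


print(smoking_index_band(smoking_index(20, 15)), ipaq_level(1200), has_insomnia(6))
```

Checking the boundary values (SI of exactly 300, 600 and 3000 MET-min/w, an AIS score of 6, a D/C ratio of exactly 1) is a quick way to confirm that each rule matches the wording of its definition.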

Sample Size Calculation
The sample size calculation method for developing a clinical prediction model proposed by Richard et al. was used [21].
To ensure that the model could accurately estimate the overall mean risk of the outcome, the prevalence of hyperuricemia θ was taken as approximately 12% from the literature [22], and the margin of error δ was set at 0.05, which required at least 144 study subjects.
To keep the mean error across all individual predicted values small, the mean absolute prediction error (MAPE) was set to 0.05, the anticipated Cox-Snell $R^2_{CS}$ was set to 0.1, and the number of predictor variables P was about 15, which required at least 433 study subjects.
To ensure an expected shrinkage of 10% and reduce model overfitting, the shrinkage factor S was set to 0.9 and the number of predictor variables P was about 15, which required at least 1274 study subjects.
To ensure that the difference between the apparent and the optimism-adjusted $R^2_{CS}$ of the developed model was small, $R^2_{CS}$ in Equation (4) was 0.1, $\max R^2_{CS}$ was 0.48, and S was calculated to be 0.81, which required at least 600 study subjects.
It was calculated that at least 1274 cases were needed to establish the model sample, and a total of 2992 cases were included in this study. The sample size met the needs of the study.
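The two shrinkage-based figures above (1274 and 600) can be reproduced from the stated inputs. The sketch below assumes the commonly used closed-form expressions for a binary-outcome prediction model, n = P / ((S − 1) ln(1 − R²CS / S)) and S = R²CS / (R²CS + δ · max R²CS); the value δ = 0.05 in the second expression and the rounding of S to two decimals are assumptions, chosen because they match the reported S = 0.81. Function names are hypothetical, and this is not the authors' code.

```python
import math


def n_for_shrinkage(n_predictors: int, r2_cs: float, shrinkage: float) -> float:
    """Sample size at which the expected uniform shrinkage factor equals `shrinkage`."""
    return n_predictors / ((shrinkage - 1.0) * math.log(1.0 - r2_cs / shrinkage))


def shrinkage_for_small_optimism(r2_cs: float, max_r2_cs: float, delta: float = 0.05) -> float:
    """Shrinkage factor that keeps optimism in R2_CS within delta * max R2_CS."""
    return r2_cs / (r2_cs + delta * max_r2_cs)


P, R2_CS, MAX_R2_CS = 15, 0.1, 0.48  # inputs quoted in the text

# Criterion with S fixed at 0.9 (expected shrinkage of 10%).
n_overfit = math.ceil(n_for_shrinkage(P, R2_CS, 0.9))

# Criterion on the apparent vs. adjusted R2_CS; S is rounded to 2 decimals (assumption).
s_optimism = round(shrinkage_for_small_optimism(R2_CS, MAX_R2_CS), 2)
n_optimism = math.ceil(n_for_shrinkage(P, R2_CS, s_optimism))

print(n_overfit, s_optimism, n_optimism)
```

The largest requirement across the criteria determines the minimum sample size, which is why 1274 is the figure carried forward.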

Model Building
The current study consisted of two main phases: (1) variable screening and (2) model development. We used LASSO regression for variable selection, screening the important variables among 54 clinical characteristics by shrinking their coefficients. The code for the LASSO regression implementation is shown in Supplementary Material S1. Logistic regression, CNN, and XG Boost models were then developed based on the selected variables and the literature review.

Logistic Regression Model
The logistic regression model was built using the scikit-learn (sklearn) package in Python 3.6. The code for the logistic model implementation is shown in Supplementary Material S2.

CNN Model
The CNN model was built using the NumPy package, and the sigmoid function was used as the activation function. The code for the CNN model implementation is shown in Supplementary Material S3.

XG Boost Model
The XG Boost model was built using the scikit-learn (sklearn) package, with the sigmoid function as the activation function and binary cross entropy (BCE) as the loss function. The code for the XG Boost model implementation is shown in Supplementary Material S2.
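To make the roles of the sigmoid and the binary cross entropy concrete, the sketch below shows how a boosted-tree classifier with this objective turns summed tree outputs into a probability, and the first and second derivatives of the loss that gradient boosting uses when fitting each new tree. It is a generic illustration written with the standard library only, not the code in Supplementary Material S2; all function names and the example tree outputs are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Map a raw score (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def bce(y: int, p: float, eps: float = 1e-15) -> float:
    """Binary cross entropy for one observation, with clipping to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def grad_hess(y: int, raw_score: float) -> tuple:
    """Gradient and Hessian of BCE with respect to the raw score."""
    p = sigmoid(raw_score)
    return p - y, p * (1.0 - p)


def predict_proba(tree_outputs: list, base_score: float = 0.0) -> float:
    """The raw score is the sum of per-tree outputs; the sigmoid converts it to a risk."""
    return sigmoid(base_score + sum(tree_outputs))


# Hypothetical leaf values from three trees for one worker.
risk = predict_proba([0.4, -0.1, 0.3])
print(round(risk, 3), grad_hess(1, 0.0))
```

The gradient p − y is the residual on the probability scale, so each new tree is fitted to the cases the current ensemble predicts worst.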

Model Evaluation
The predictive performance of the models was evaluated in terms of discrimination, calibration, and clinical applicability. The discrimination indices included sensitivity, specificity, the Youden index, the ROC curve, and the area under the curve (AUC). The calibration indices included the Brier score, Log loss, and the calibration curve. Clinical applicability was evaluated by decision curve analysis (DCA).
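For readers less familiar with these indices, the sketch below computes each of them from a vector of observed outcomes and predicted risks using their standard definitions. It uses only the standard library; the 0.5 classification threshold and the five-observation toy data are assumptions for illustration and are not taken from the study.

```python
import math


def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Sensitivity and specificity when risk >= threshold is classified as positive."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_prob) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(y_true, y_prob) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def youden_index(sens, spec):
    """Youden index J = sensitivity + specificity - 1."""
    return sens + spec - 1.0


def auc(y_true, y_prob):
    """AUC as the probability that a random case outranks a random non-case (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross entropy, with clipping to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


# Toy data (illustrative only).
y_true = [1, 1, 0, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.2, 0.1]
sens, spec = sensitivity_specificity(y_true, y_prob)
print(sens, spec, youden_index(sens, spec), auc(y_true, y_prob), brier_score(y_true, y_prob))
```

Lower Brier score and Log loss indicate better calibration, whereas higher sensitivity, specificity, Youden index, and AUC indicate better discrimination.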

Statistical Analysis
An Excel 2010 database was established from the questionnaire and physical examination data. Risk factors for HUA in steelworkers were screened, and a prediction model was established based on the screened variables. Count data were described as rates or composition ratios, and the χ² test was used for comparison between groups; ordinal data were described as rates or composition ratios, and the Kruskal-Wallis test was used for comparison between groups. SPSS 26.0 and Python 3.9 were used for statistical analysis. The test level α was set at 0.05, and all tests were two-sided.

Quality Control
Design phase: the literature was reviewed and experts were consulted to modify and improve the study protocol. Data collection phase: investigators were uniformly trained; double-checking of data entry was used, and manual and computerized checks of data entry and logical errors were performed to ensure the authenticity of the data. Data analysis phase: the training set and test set were selected at random.

Study Population
A total of 4518 steelworkers participated in the occupational health screening. After removing 989 HUA patients, 385 workers who missed visits, and 152 with incomplete information from the baseline survey population, the final cohort comprised 2992 workers. The study population was randomly divided into a training cohort (n = 2094) and a validation cohort (n = 898) in a ratio of 7:3, as shown in Figure 1.
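The cohort arithmetic and the 7:3 split can be checked with a few lines of code. This is a minimal sketch using the standard library; the seed value and the function name are hypothetical, and the study's actual random split is not reproduced here.

```python
import random


def split_indices(n, validation_fraction=0.3, seed=2017):
    """Shuffle row indices and split them into training and validation sets.

    The seed is an arbitrary choice for reproducibility of this sketch.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_validation = round(n * validation_fraction)
    return indices[n_validation:], indices[:n_validation]


cohort_size = 4518 - 989 - 385 - 152  # exclusions quoted in the text
train_idx, valid_idx = split_indices(cohort_size)
print(cohort_size, len(train_idx), len(valid_idx))
```

Splitting on shuffled indices rather than on the data itself makes it easy to apply the same partition to every model being compared.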

Analysis of Study Population Characteristics
The cohort was followed up from March to September 2019, with a median follow-up time of 26 months. There were 465 new HUA cases, giving a crude incidence of 15.5% (16.31% in men and 7.58% in women). A comparative analysis of the basic characteristics of the workers in the training and validation cohorts revealed no statistically significant differences in the indicators, as detailed in Table 1.

Variable Screening
Predictor variables were screened by LASSO regression, and 6 of the 54 candidate variables were finally selected: total cholesterol, BMI, blood pressure, waist circumference, creatinine, and DASH score, as shown in Figure 2. The left panel is the LASSO coefficient path diagram, in which each curve represents the trajectory of the coefficient of one variable; variables whose coefficients shrank to 0 first were excluded. The right panel is the cross-validation curve. The cross-validation error was smallest at the parameter value marked by the dashed line, and the intersection of the dashed line with the abscissa corresponds to the Lambda in the left panel. In this way, six indicators with a large impact on the study outcome were selected. From the literature review, we found that smoking, alcohol consumption, and physical activity were also important influencing factors of HUA [23], so they were added to the subsequent model development.
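The way LASSO drives some coefficients exactly to 0 can be illustrated with a small coordinate descent routine. The sketch below is a simplification and not the code in Supplementary Material S1: it fits a squared-error LASSO rather than a model for a binary outcome, it assumes standardised predictors (mean 0, mean square 1) and a centred outcome, and the toy data and penalty value are invented for illustration.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; values within the penalty become exactly 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=100):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent.

    Assumes each column of X has mean 0 and mean square 1, and y has mean 0.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # residual = y - X @ beta, with beta starting at 0
    for _ in range(n_sweeps):
        for j in range(p):
            # Covariance of feature j with the residual after adding its own fit back.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, penalty)
            change = new_beta - beta[j]
            if change != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * change
                beta[j] = new_beta
    return beta


# Toy data: y depends strongly on the first column and only weakly on the second.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.1, -1.9, 1.9, -2.1]
beta = lasso_coordinate_descent(X, y, penalty=0.5)
print(beta)
```

With this penalty the weak predictor is removed entirely while the strong one is retained with a shrunken coefficient, which is the behaviour the coefficient path diagram displays across a range of Lambda values.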

Multicollinearity Test
The predictor variables were tested for multicollinearity. The variance inflation factors of all variables were below 1.4, and the tolerances were between 0 and 1, indicating that there was no multicollinearity problem, as shown in Table 2.
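As a reminder of what these two statistics mean, the variance inflation factor of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on all the others, and the tolerance is its reciprocal. The short sketch below uses these standard definitions to translate the reported upper bound of 1.4 into the corresponding R² and tolerance; the function names are hypothetical.

```python
def vif_from_r_squared(r_squared):
    """VIF of one predictor, given R^2 from regressing it on the other predictors."""
    return 1.0 / (1.0 - r_squared)


def tolerance_from_vif(vif):
    """Tolerance is the reciprocal of the VIF (equivalently 1 - R^2)."""
    return 1.0 / vif


def r_squared_from_vif(vif):
    """Invert the VIF definition to recover R^2."""
    return 1.0 - 1.0 / vif


print(vif_from_r_squared(0.0), tolerance_from_vif(1.4), r_squared_from_vif(1.4))
```

A VIF of 1.4 therefore means that less than 29% of the variance of that predictor is explained by the others, which is well within commonly used limits.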

Evaluation of Model Effectiveness
The results of the training set of 2094 cases (70%) showed that the XG Boost model was better than the other two models in terms of sensitivity, specificity, Youden index, F1 score, AUC (95% CI), Brier score, and Log loss. The CNN model had the highest classification accuracy, at 86.8%. The indicators of the Logistic regression model were slightly worse, as shown in Table 3; the ROC curves are shown in Figure 3a. The results of the validation set of 898 cases (30%) showed that the XG Boost model outperformed the other two models in terms of classification accuracy, sensitivity, specificity, Youden index, F1 score, AUC (95% CI), Brier score, and Log loss, as shown in Table 3; the ROC curves are shown in Figure 3b.
The XG Boost model outperformed both the CNN and Logistic regression models in terms of Brier score and Log loss. The calibration curves for both the training and validation sets were close to the diagonal, with no serious deviation. Moreover, the XG Boost model performed best in terms of calibration accuracy, with the Logistic regression model coming second and the CNN model deviating more from the diagonal, as shown in Figure 4.
The clinical decision curves for the three models are shown in Figure 5, among which the XG Boost model had the highest clinical applicability, and the Logistic regression and CNN models had slightly worse clinical applicability. The nomogram of HUA risk in steelworkers is shown in Figure 6.
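A decision curve plots net benefit against the risk threshold at which a worker would be flagged for intervention. The sketch below computes the standard net benefit quantity, TP/n − FP/n × pt / (1 − pt), for a model and for the "intervene in everyone" reference strategy. The toy data and the threshold of 0.2 are assumptions for illustration and are unrelated to Figure 5.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of intervening in everyone whose predicted risk is >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of intervening in everyone, regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data (illustrative only).
y_true = [1, 1, 0, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.2, 0.1]
nb_model = net_benefit(y_true, y_prob, threshold=0.2)
nb_all = net_benefit_treat_all(y_true, threshold=0.2)
print(nb_model, nb_all)
```

A model is clinically useful at a given threshold when its net benefit exceeds both the "intervene in everyone" value and zero (the "intervene in no one" strategy); repeating the calculation over a grid of thresholds traces out the curve.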

Discussion
In this study, we used LASSO regression to screen predictor variables and selected 6 influencing factors from 54 candidate predictors. LASSO regression is a variable selection algorithm for high-dimensional data that simplifies the model by applying a penalty function to the coefficients, thereby completing the screening of predictor variables [24]. Compared with the traditional stepwise regression method, LASSO regression processes all independent variables simultaneously, which not only controls model overfitting effectively but also makes the model more stable. In addition, we added three influencing factors of HUA among steelworkers identified through the literature review, namely smoking, alcohol consumption, and physical activity. By comparing the predictive performance of the three models, we found that the XG Boost model was the optimal model in this study, achieving better results in three areas: discrimination (AUROC 0.806), calibration (Brier score 0.095), and clinical applicability. Our study highlights the value of occupational health screening data in predicting HUA, and the screening of predictor variables may provide a scientific basis for the prevention and treatment of HUA in steelworkers.
The current study showed that overweight and obesity were important risk factors for the development of HUA in steelworkers, which is similar to the findings of several previous studies [25][26][27]. Some studies have shown that obesity and HUA are causally related to each other and are closely associated with unhealthy dietary habits, alcohol intake, and a sedentary lifestyle [11]. On the one hand, obese people tend to eat more meat, which increases exogenous purine intake and leads to HUA. On the other hand, obese people ingest more energy than they consume, resulting in excessive synthesis of purines in the body and increased endogenous uric acid production [28]. An analysis of the US population found that BMI was the most important modifiable risk factor for HUA, with 44% of HUA cases attributable to overweight or obesity [29]. Both previous and current studies suggest that controlling overweight and obesity is beneficial in reducing the incidence of HUA. Diet is another important factor influencing the occurrence of HUA. The DASH dietary pattern used in the current study was originally designed to control hypertension; it focuses on plant-based foods and high-quality protein, significantly reduces blood pressure, and has also been used for cardiovascular disease prevention. Regarding the effect of the DASH diet on the risk of gout, Sharan conducted a cohort study that included more than 40,000 subjects with up to 26 years of follow-up, and the results showed that the DASH diet score was negatively associated with the risk of gout [30]. A possible mechanism is that the DASH diet is lower in purines, reducing the purine load in the body. In addition, the DASH diet may reduce SUA levels by improving insulin resistance [31].
The above study supported the view of the current study that the DASH dietary pattern was a protective factor for the occurrence of HUA in steelworkers. Eating more fruits and vegetables and controlling sugar intake can contribute to the primary prevention of HUA in steelworkers.
Some studies have shown that reducing smoking and alcohol consumption and adopting a less sedentary lifestyle can contribute to the prevention of HUA [23]. Smoking or exposure to secondhand smoke can increase the risk of HUA and gout. A possible reason is that smoking excites the autonomic nervous system and affects purine metabolism, with the potential effect of elevating SUA. In addition, the harmful substances in tobacco can adversely affect the respiratory and circulatory systems, leading to slower blood circulation and impaired uric acid excretion [32]. Alcohol consumption is another important influencing factor in the development of HUA. Firstly, the metabolism of ethanol consumes a large amount of water, which raises the SUA value. Secondly, the metabolism of ethanol readily produces lactic acid, which is excreted through the kidneys and impedes the normal excretion of uric acid [33]. A sedentary lifestyle can lead to increased uric acid due to slower blood circulation. Moderate exercise accelerates metabolism and facilitates the excretion of uric acid. Long-term moderate-intensity aerobic exercise, and aerobic exercise combined with strength training, can reduce SUA concentrations in HUA patients [34], and the decrease in SUA is positively correlated with the amount of exercise when exercise is performed at aerobic intensity. A possible mechanism is that long-term moderate-intensity aerobic exercise protects renal function by alleviating the inflammatory response and ameliorates renal injury by promoting the expression of uric acid excretory proteins. Exercise thus plays a direct or indirect role in reducing SUA [35].
In this study on the prediction of HUA onset in steelworkers, the XG Boost model achieved better results and was more suitable for predicting HUA onset risk in steelworkers. XG Boost is a supervised classification model based on multiple trees, which essentially takes the sum of the predicted values of each tree as the final predicted value. XG Boost has excellent computational efficiency, generalization ability, and overfitting control, which has long made it a dominant solution in data science competitions. Rajdeep used six different machine learning algorithms to predict obesity risk and achieved a classification accuracy of up to 97.87% for the XG Boost model [36]. Savitesh predicted the risk of pre-diabetes in children and adolescents, found that XG Boost was the best classification model with a 10-fold cross-validation score of up to 90.13%, and integrated the XG Boost algorithm into a screening tool for the automatic prediction of pre-diabetes [37]. Shoukun performed miner fatigue identification based on physiological indicators from ECG and EMG and found that the XG Boost model had the best accuracy and robustness, with a recognition accuracy of 89.47% and an AUC of 0.90, indicating that recognition of miner fatigue based on the XG Boost model is feasible [38]. The ability of logistic regression to correct for different prevalence rates has made it widely used in medical research, but it showed poor classification ability and low sensitivity in this study. Although the classification ability of the CNN model was relatively strong, it performed poorly in calibration, possibly because CNNs are better suited to image problems.
Our study has several advantages. First, when evaluating the sample size, we did not use the empirical rule of 10 events per variable (EPV) but used the calculation method proposed by Richard, which guarantees the expected shrinkage and controls the error of individual predicted values [21]; this made the sample size calculation for the HUA onset risk prediction model for steelworkers more rigorous. Second, instead of the traditional stepwise regression method, we chose LASSO regression, which allowed extensive screening of candidate predictor variables. LASSO regression compensated for the tendency of stepwise regression to settle on locally optimal estimates and effectively helped us select predictor variables. In addition, we added three recognized influencing factors (smoking, alcohol consumption, and physical activity) based on the literature review, which gives the predictive model for HUA in steelworkers public health significance. Third, during the development of the model, we evaluated performance comprehensively in terms of discrimination, calibration, and clinical applicability. Fourth, we developed a nomogram to predict the risk of HUA in steelworkers. The nomogram is clear and intuitive. Steelworkers can use it to estimate their own future risk of developing HUA, and clinicians can use it to identify workers at high risk of HUA quickly and accurately for targeted health education. By understanding their own risk of HUA and becoming more aware of risk factors, steelworkers can change unhealthy lifestyles accordingly and reduce their risk of illness.
Our study has certain limitations. Firstly, as data are not easily available, our study did not find a suitable dataset to externally validate the newly developed HUA risk prediction model for steelworkers. Secondly, we only used traditional machine learning algorithms and did not improve on the relevant algorithms. Therefore, in the future, we will further investigate new algorithms to improve the predictive performance of the model.

Conclusions
The prediction effect of the XG Boost model was better than the CNN and Logistic regression models and was suitable for the prediction of HUA onset risk in steelworkers.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijerph20043411/s1, Table S1: Variable filter forecast table; Material S1: Lasso regression code; Material S2: Log and XG code; Material S3: CNN code.
Data Availability Statement: The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy restrictions.